PLANT SCIENCE 



HYPOTHESIS AND THEORY ARTICLE 

published: 03 September 2014 
doi: 10.3389/fpls.2014.00437



Soybean seed proteome rebalancing 



The soybean seed's protein content and composition are regulated by both genetics
and physiology. Overt seed protein content is specified by the genotype's genetic
framework and is selectable as a breeding trait. Within the genotype-specified protein
content phenotype, soybeans have the capacity to rebalance protein composition to create
differing proteomes. Soybeans possess a relatively standardized proteome, but mutation or
targeted engineering can induce large-scale proteome rebalancing. Proteome rebalancing
shows that the output traits of seed content and composition result from two major types
of regulation: genotype and post-transcriptional control of proteome composition.
Understanding the underlying mechanisms that specify the seed proteome can enable
engineering of new phenotypes for the production of a high-quality plant protein source
for food, feed, and industrial proteins.
SOYBEANS ARE A GLOBAL PROTEIN COMMODITY

Among global commodity crops, soybean has an almost unique
role, being high enough in protein content to provide the
nitrogen (N) needed for efficient large-scale animal feed
production. Soybean possesses economically valued oil and
protein and is an archetype seed used to dissect the processes
that specify seed compositional output traits. Over the past
decade, considerable public and industry funds have been invested
to create soybean community resources, including genomic,
transcript, SNP and SSR maps, and proteomics, and have supported
a broad range of bioactivity, biochemical, nutritional, and
agronomic projects.

PROTEIN CONTENT AS A GENOTYPE 

The genome of soybean specifies the genetic framework for seed
formation and maturation, and it controls the expression, mix,
and timing of synthesis of the storage metabolite traits (Wilson,
2004 for general information; Hartwig and Kilen, 1991; Wilcox and
Shibles, 2001; Chung et al., 2003; Nichols et al., 2006; Bolon
et al., 2010). Plant breeding has shown that different soybean
genotypes specify a standardized, often line/cultivar-specific
protein content (Wilson, 2004). The genetic program that produces
seeds is simultaneously manifested in the embryo, endosperm, and
maternal plant. Genetic marker analysis, using SNPs, has identified
QTLs demonstrating that overt protein and oil content have a strong
genetic determinant (see Diers et al., 1992; Cregan et al., 1999;
Zhao-Ming et al., 2011 for examples); these traits have supported
generations of breeders who have enhanced soybean as a crop
(Brim and Burton, 1979; Carter et al., 1986; Cober and Voldeng,
2000; Wilson, 2004).

Of the three individual components that comprise seeds, two
are reproductive progeny: the endosperm and the zygotic embryo,
which result from double fertilization and are enclosed in the
maternal-origin seed coat that connects the maturing seed to the
maternal plant. Each reproductive-phase soybean plant consists
of a coordinated network of embryos and endosperms for the
common goal of maximizing reproductive output. Historically,
breeding programs have selected traits for enhancement of seed
productivity, storage metabolite content, and important
agronomic performance traits of the maternal plant. One way to
view seed production by plants is as a population that produces
(maternal) and distributes (maternal and endosperm) nutrient
metabolites to the embryonic sink. During soybean seed
development, the endosperm undergoes progressive programmed cell
death that is completed prior to the accumulation of stored
metabolites. By the onset of protein and oil accumulation, only
a single cell layer of aleurone remains from the endosperm; it
encapsulates the embryo, separating it from the inner surface of
the maternal seed coat. However, the physiological role, if any,
of the aleurone in regulating nutrient flux to the developing
embryo has not been investigated. Viewed in this way, the soybean
plant's progeny are a population of aleurones and embryos
interacting with the metabolite flux. The non-synchronized
developing population of seeds must both compete and synchronize
with the common maternal nutrient source. To ensure that mature
seeds are nearly all equivalent in composition, independent of
their particular position on the maternal plant, their
developmental program and physiological regulation must be
coordinated with the capacity of the maternal plant to nourish
them.

The embryo's genotype specifies maturation stages (Fehr et al.,
1971) that are controlled by transcription factors that provide
the developmental framework for storage substance accumulation
(Hill and Breidenbach, 1974; Goldberg et al., 1981a,b; Mienke
et al., 1981; Walling et al., 1986; Naito et al., 1988; Harada
et al., 1989; Perez-Grau and Goldberg, 1989; Nielsen and Nam,
1999; To et al., 2006). A large number of soybean seed-specific
DNA binding proteins have been identified
(http://casp.rnet.missouri.edu/soydb/), and some of these have
been shown to regulate specific seed-maturation genes (Chen
et al., 1986; Jofuku et al., 1987; Allen et al., 1989; Lessard
et al., 1991; Baumlein et al., 1992; Kwong et al., 2003; Wang
et al., 2007). A key role of some transcription factors is to
regulate the metabolic and developmental processes that support
storage substance accumulation (see Kroj et al., 2003; Gutierrez
et al., 2007; Santos-Mendoza et al., 2008 for reviews).
Understanding how cooperation between the embryo, endosperm, and
maternal organs is integrated at the level of gene expression and
cross-regulation of metabolism is important for creating models
of the source-sink relationship of seed-fill.

NUTRIENT SOURCE DEFINES SEED PROTEIN 
ACCUMULATION 

Whole-plant physiological experiments demonstrate that nutrient
distribution to seeds is highly regulated. The seed protein
output trait is primarily regulated by controlling the composition
of seeds, with the total number of seeds being a consequence of
nutrient availability. The average size of seeds from small
plants differs only slightly from that of seeds from larger
plants, but the total number of seeds produced is directly related
to the total biomass of the maternal plant and its mobilized
metabolite source potential. In an agronomic context, this defines
yield. From the perspective of the plant, the protein content
genotype maximizes the potential of individual seeds, with the
overall yield of seeds depending on the available biomass and
growth conditions.
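The source-sink reasoning above can be expressed as a toy calculation. This is a minimal sketch under stated assumptions, not a model taken from the cited work: it assumes each seed is filled to a fixed, genotype-specified nitrogen content (the hypothetical parameter `n_per_seed_mg`) and that seed number is limited only by the nitrogen the maternal plant can mobilize (the hypothetical parameter `mobilizable_n_mg`). All numbers are illustrative placeholders, not measured values.

```python
"""Toy source-sink sketch (hypothetical parameters, illustration only).

Assumptions: per-seed N content is fixed by the protein content genotype,
and seed number is limited only by the N the maternal plant can mobilize.
"""


def seeds_supported(mobilizable_n_mg: float, n_per_seed_mg: float) -> int:
    """Number of seeds that can be filled to the genotype-specified N content."""
    if n_per_seed_mg <= 0:
        raise ValueError("n_per_seed_mg must be positive")
    if mobilizable_n_mg < 0:
        raise ValueError("mobilizable_n_mg must be non-negative")
    # Whole seeds only: a partially filled seed does not count.
    return int(mobilizable_n_mg // n_per_seed_mg)


def seed_n_yield_mg(mobilizable_n_mg: float, n_per_seed_mg: float) -> float:
    """Total N allocated to seeds: fixed per-seed content times seed number."""
    return seeds_supported(mobilizable_n_mg, n_per_seed_mg) * n_per_seed_mg


if __name__ == "__main__":
    # Hypothetical plants that differ only in source size.
    for source_mg in (500.0, 1000.0, 2000.0):
        n_seeds = seeds_supported(source_mg, n_per_seed_mg=10.0)
        print(f"source {source_mg:>6.0f} mg N -> {n_seeds:>3d} seeds, "
              f"{seed_n_yield_mg(source_mg, 10.0):.0f} mg N in seeds")
```

In this toy setting, doubling the source doubles seed number while the per-seed value stays at the genotype-specified level, which mirrors the observation that seed size varies little between small and large plants while seed number tracks maternal biomass.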

Although the maternal plant, endosperm, and embryo function in
concert to form the seed, their metabolic interaction occurs
without a direct, contiguous flow of nutrients, as each is
apoplastically isolated from the others (Thorne, 1980; Thorne and
Rainbird, 1983; Egli and Bruening, 2001; Patrick and Offler, 2001
for review). The metabolite flux from the maternal plant through
the aleurone to the embryo results from coordinated secretion from
the source and uptake by the sink, and this potentially determines
the storage output trait (Borisjuk et al., 2003, 2004). In annual
plants such as soybean, the maternal plant must grow rapidly and
produce nutrient-capture organs (i.e., roots and leaves).

The relationship between the maternal plant (source) and the
seed (sink) has been investigated with increasingly sophisticated
tools and concepts for the past 40 years. For soybean, early
studies focused on accumulation of vegetative proteins, primarily
in foliage, as the nitrogen store that is mobilized to the seed and
determines the accumulation of storage substances. This model of
resource acquisition parallels that of most other seed plants,
where metabolites in foliage are later mobilized to the seed. The
amount of carbon fixed during photosynthesis is highly responsive
to the environment, and the maternal plant manages carbon flux so
it can distribute nutrients based on their availability and the
demand of the seed (endosperm/embryo) sink (see Fellows et al.,
1979; Borchers-Zampini et al., 1980 for early examples). A number
of studies have shown that leaf proteins, predominantly Rubisco,
accumulate over time (Schaefer et al., 1981a,b). In addition,
soybean leaves accumulate a vegetative storage protein (VSP), a
member of the vacuolar acid phosphatase family (DeWald et al.,
1992; Staswick et al., 1994). VSP accumulation is highly
responsive to nitrogen availability (Franceschi and Giaquinta,
1983; Franceschi et al., 1983; Staswick, 1989a,b), and it
increases with depodding, i.e., removal of the seed sink. This
observation led to proposals that VSP is a necessary adjunct that
provides additional nitrogen resources for the seed. However,
later work silencing the VSP gene showed that VSP does not appear
to affect soybean seed protein content (Staswick et al., 2001).



Accumulated leaf proteins are mobilized by specific proteases
(see Ragster and Chrispeels, 1979, 1981a,b) found in leaf cell
vacuoles and plastids; these enzymes mediate the hydrolysis of
Rubisco and VSP as well as other less abundant leaf proteins.
Removal of all maturing seeds except those on one portion of the
plant leads to redistribution of source leaf nutrient flux
(Carlson and Brun, 1984), indicating that there must be an (as
yet unknown) feedback regulation between the seed and the nutrient
source that is manifested through a long-distance signal. Systems
biology approaches could determine how the size and composition of
the source are regulated, and how its mobilization is coordinated
with the draw of the sink.

Leaf photosynthate and the products of protein hydrolysis produce
a metabolite flux consisting predominantly of sucrose, glutamine,
and asparagine (Hsu et al., 1984; Rainbird et al., 1984; Krishnan
et al., 2011); for nitrogen-fixing legumes in particular, there
are also ureides from xylem fluid. With respect to the amino acid
flux, the inputs from glutamine and asparagine have different
characteristics. Skokut et al. (1982), using ¹⁵N-NMR, showed that
there is no discrimination between the amino and amide N of
glutamine, but for Asn the amino N is incorporated into protein
twice as efficiently as the amide N, indicating a key role of
asparagine in transamination. Asn may also be directly
incorporated into proteins, with dual-labeled ¹³C and ¹⁵N Asn
being incorporated directly without scrambling the labels (Schaefer
et al., 1981a,b). Within the seed, free Asn accounts for a large
fraction of the free amino acids (33-49%), with the fractional
amount varying by genotype (Schaefer et al., 1981a,b). Asn
dominates seed coat free amino acid efflux, assayed as apoplastic
fluid, from the seed cup, i.e., the seed coat with the embryo
removed (Gifford and Thorne, 1985; Murray, 1987; De Jong et al.,
1997; Hernandez-Sebastia et al., 2005; Pandurangan et al., 2012).
It is unclear whether the tissue source used for these experiments
was derived from the inner surface of the aleurone or the maternal
seed coat, or a combination of both, since the aleurone often
adheres to the inner side of the seed coat. This shows that,
distinct from the embryo, the seed coat (perhaps comprising
aleurone and seed coat) has an amino acid composition similar to
that of the source (assayed as xylem sap), in which Asn is the
dominant N source and is about 10-fold more abundant than Gln
(Thorne and Rainbird, 1983; Lohaus et al., 1998; Zhang et al.,
2010; Krishnan et al., 2011; Pandurangan et al., 2012).

PROTEIN COMPOSITION PLASTICITY AND THE SEED 
PROTEIN CONTENT GENOTYPE 

The soybean seed protein output trait has two primary components:
total protein content and the composition of individual proteins
(the proteome). For soybean, as for many other seeds (Herman and
Larkins, 1999 for review), the two major storage proteins, glycinin
(11S legumin type) and conglycinin (7S vicilin type), dominate the
proteome. The soybean seed proteome also includes many moderately
abundant proteins that are bioactive and allergenic, such as the
Kunitz and Bowman-Birk trypsin inhibitors, lectin, P34 allergen,
sucrose binding protein, urease, and oleosins (Herman and Burks,
2011), and several thousand low-abundance proteins, including
enzymes that mediate metabolism, synthesize storage substances, and
create the structural framework of the cell. The specific mix of
proteins and each protein's abundance within the proteome determine
the total amino acid composition trait.

Since the development of plant transformation techniques, there
have been many attempts to express genes to induce accumulation of
large quantities of foreign proteins in seeds. The goals of these
projects were often to alter the nutritional quality of seeds by
increasing essential amino acids such as methionine in soybean, or
to use the seeds as protein bioreactors. Even with strong storage
protein promoters to regulate transgene expression, the most
frequent outcome of such experiments was relatively small amounts
of the heterologous protein (about 1% of total), with protein and
amino acid compositions little altered compared with controls.
These minor compositional changes occurred whether the protein was
targeted to the vacuole or accreted in ER-derived protein bodies,
suggesting that the limit of protein accumulation is independent of
its deposition site. The gap between the observed heterologous
protein production and the expectations of the engineering design
indicated that there must be seed regulatory mechanisms that limit
foreign protein production so as not to significantly alter the
seed protein content phenotype.

The converse experiment is to silence the intrinsic major storage
protein genes and assess the impact on the seed's protein content.
Kinney et al. (2001) showed that total soybean protein content was
conserved after silencing conglycinin, which constitutes about 20%
of the total seed protein. The resulting seeds accumulated more
glycinin, which apparently compensated for the missing conglycinin.
Of the five glycinin genes in the soybean genome, the protein
encoded by the glycinin A4 gene tends to accrete in the ER,
producing ER-derived protein bodies (Herman and Schmidt, 2004;
Herman, 2008), which are not normally found in soybean (Kinney
et al., 2001). Mori et al. (2004) reported similar observations in
a conglycinin mutant, obtained by screening a collection, that
exhibited the same "glycinin rebalancing" phenotype, with some of
the glycinin remaining as proglycinin and accumulating in
ER-derived protein bodies. Schmidt and Herman (2008) showed that
introducing a gene encoding a foreign protein into the
conglycinin-glycinin rebalancing background increased the
accumulation of the heterologous protein. A GFP-HDEL gene was
constructed as a glycinin-gene mimic allele, with a glycinin
promoter and terminator. The addition of the HDEL ER-retention
sequence was intended to promote accretion of the protein in the ER
to form ER-derived protein bodies, mimicking the accreting glycinin
by substituting the GFP-HDEL coding sequence for the glycinin ORF.
Expression of GFP-HDEL in the standard cv. Jack soybean background
resulted in about 1% of the total seed protein accumulating as GFP
in protein bodies, a level typical of heterologous protein
production experiments in seeds. However, introgressing the
GFP-HDEL glycinin mimic allele into the conglycinin-silenced line
resulted in about eightfold increased accumulation of GFP, as the
glycinin mimic was utilized to compensate for the conglycinin
shortfall (Figure 1).
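The approximate figures quoted above can be combined into a back-of-envelope estimate. This sketch assumes the ~1%, ~eightfold, and ~20% values can simply be multiplied and divided, which is an idealization because they are rounded and come from different experiments; the constant and function names are hypothetical. Under that assumption, GFP rises to roughly 8% of total seed protein in the conglycinin-silenced line, covering on the order of 40% of the conglycinin shortfall.

```python
# Approximate values quoted in the text (rounded; from different experiments).
CONGLYCININ_SHARE = 0.20  # conglycinin: ~20% of total seed protein
GFP_SHARE_JACK = 0.01     # GFP-HDEL in cv. Jack: ~1% of total seed protein
FOLD_INCREASE = 8         # ~eightfold more GFP in the conglycinin-silenced line


def gfp_share_in_silenced_line(base_share: float, fold: float) -> float:
    """Estimated GFP share of total seed protein after the fold increase."""
    return base_share * fold


def fraction_of_shortfall(gfp_share: float, missing_share: float) -> float:
    """Fraction of the missing conglycinin share that GFP would account for."""
    return gfp_share / missing_share


gfp_share = gfp_share_in_silenced_line(GFP_SHARE_JACK, FOLD_INCREASE)
covered = fraction_of_shortfall(gfp_share, CONGLYCININ_SHARE)
print(f"GFP: ~{gfp_share:.0%} of seed protein; "
      f"covers ~{covered:.0%} of the conglycinin shortfall")
```

On this rough reading, the remaining ~60% of the shortfall would be absorbed by glycinin and other proteins, consistent with the glycinin compensation reported by Kinney et al. (2001); the estimate is only as good as the rounded inputs.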

FIGURE 1 | Variations of seed composition within the context of the
protein content genotype. The pie charts show the relative
distribution of proteins comprising the seed proteome of the
standard soybean line Jack, a line that silences β-conglycinin α
and α′, another line that silences both β-conglycinin and glycinin,
and a line in which a GFP-HDEL glycinin mimic forming ER-derived
protein bodies is expressed in the β-conglycinin-silenced
background. Each of these seeds contains about the same total
protein content, while the proteome of each seed varies. This
demonstrates how the genomic framework controls the protein content
genotype, which can be rebalanced by protein composition plasticity.

To further test regulation of the protein content genotype and its
capacity to allow for proteome alterations, RNAi silencing of both
the glycinin and conglycinin storage proteins was used to create SP-
(storage protein minus) lines that eliminated over two thirds of the
protein content of standard soybean seeds (Schmidt et al., 2011).
SP- seeds exhibited a number of different phenotypes, including
conservation of the same seed protein content as the parental line
due to compensatory increases of other vacuolar proteins, including
Kunitz Trypsin Inhibitor, Lectin, P34, and sucrose binding protein
(Figure 1). These compensating proteins accumulated at levels up to
11X higher than normal, and each protein's increase occurred without
a parallel increase in its steady-state RNA transcript abundance.
This suggests that the protein content trait is determined by
genotype, whereas the abundance of individual proteome members is
regulated at the translational level.
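The SP- result implies a simple mass balance. If the silenced proteins made up a share s of total seed protein and total content is conserved, the remaining proteins, originally a share 1 - s, must on average rise by a factor of 1/(1 - s). The sketch below assumes complete silencing and exact conservation of total protein, both idealizations, and the function name is hypothetical. With s just over two thirds, the average required increase is just over threefold, a scale consistent with the up-to-11X increases reported for individual compensating proteins.

```python
def required_mean_fold_increase(silenced_share: float) -> float:
    """Average fold increase needed by the non-silenced proteins to keep
    total seed protein constant.

    Idealized assumptions: the silenced proteins are removed completely, and
    total protein content is exactly conserved.
    """
    if not 0.0 <= silenced_share < 1.0:
        raise ValueError("silenced_share must be in [0, 1)")
    remaining_share = 1.0 - silenced_share
    # Conservation: remaining_share * fold == 1.0 (the original total).
    return 1.0 / remaining_share


# Illustrative silenced shares at and just above the "over two thirds" in the text.
for share in (2 / 3, 0.70, 0.75):
    fold = required_mean_fold_increase(share)
    print(f"silenced {share:.0%} -> remaining proteins rise ~{fold:.1f}x on average")
```

Because the transcript levels of the compensating proteins did not rise in parallel, this required increase would have to be met post-transcriptionally, which is the inference drawn in the text.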

FREE ASN IS AN INDICATOR OF ALTERED PROTEIN CONTENT 
AND COMPOSITION 

Free Asn is correlated with soybean seed protein content and
composition. One line of experimental evidence has shown that in
soybean cultivars with a range of seed protein contents, there is
a positive correlation between high free Asn and high protein
content (Hernandez-Sebastia et al., 2005; Pandurangan et al.,
2012). In the SP- storage protein-silenced soybean, free Asn
increased 5.8X over the standard type (Schmidt et al., 2011).
Perhaps in response to the elevated free Asn, the steady-state
transcript (RNAseq) level for asparaginase increased 6.5X over the
conventional controls. In standard lines, the asparaginase level
was previously found to correlate with protein content (Wan et al.,
2006). Together, these observations suggest that protein increase,
whether by genotype selection for higher protein content or by
increased abundance of individual proteins within the context of
the protein genotype, is correlated with changes in free Asn and
asparaginase. This suggests that the free Asn level is a nitrogen
status indicator (Miller et al., 2008), either as a regulator of,
or as a component of, the processes that specify protein content
and composition.

CULTURED SOMATIC AND ZYGOTIC EMBRYOS EXHIBIT AN 
EXCESSIVE GROWTH TRAIT 

Ex vivo zygotic embryo and somatic embryo cultures are often used
as proxies for seed maturation; however, there are significant
differences in the metabolic behavior of embryos that form in vivo
and in vitro (see Thompson et al., 1977; Obendorf et al., 1983,
1984; Raper et al., 1984; Finer, 1988; Hayati et al., 1996;
Santarem et al., 1997; Chanprame et al., 1998; Pipolo et al., 2004;
Iyer et al., 2008; Nishizawa and Ishimoto, 2009; Allen and Young,
2013). RNA expression profiling showed that somatic embryos produce
a relatively standard set of seed-specific transcripts
(Thibaud-Nissen et al., 2003). Ex vivo cultures show fidelity to
in planta seeds but differ in the content of accumulated reserve
substances (Pipolo et al., 2004). Gln has been shown to be an
effective N-input source for these cultures (Saravitz and Raper,
1995; Schmidt et al., 2005) and is often used as the experimental
N source in nutrition-flux studies (He et al., 2011; Allen and
Young, 2013; Truong et al., 2013, for recent examples), even though
it is Asn that accounts for the large majority of the actual
maternal source N in planta (Lea and Miflin, 1980; Lohaus et al.,
1998; Lima and Sodek, 2003). A recent paper by Allen and Young
(2013) showed in cultured zygotic soybean embryos that ¹³C-Gln
supplied 36-46% of the carbon of amino acids. In another study
using somatic embryos, Truong et al. (2013) showed that increasing
Gln in extrinsic culture media resulted in increased protein
content without greater oil content, showing that Gln is
preferentially used to synthesize protein. This is consistent with
older NMR observations on ¹³C and ¹⁵N showing that the amino and
amide N of Gln, as well as its carbon, are incorporated into the
protein sink without discrimination (Schaefer et al., 1981b;
Skokut et al., 1982).

Taken together, these observations support a model in which the
maternal source supplies Asn (Pandurangan et al., 2012) as the N
source for zygotic embryos, but experimental ex vivo embryos can
effectively use Gln. The difference between Asn and Gln may be
important in the context of the morphological and compositional
differences between in planta zygotic embryos and cultured somatic
and zygotic embryos. The media used for soybean culture vary,
although Gln dominates as the N source (Haga and Sodek, 1987),
particularly in SHaM medium (Schmidt et al., 2005), which was
developed for transgenic embryo maturation (see Schmidt and Herman,
2008; Schmidt et al., 2011 for examples of its use). It is also used
in some ex vivo nutritional studies (Truong et al., 2013). The media
used for immature somatic embryo culture and transformation (Finer,
1988; Finer and McMullen, 1991; Walker and Parrott, 2001) have Asn
as the N source. Tissue culture embryos used for transformation and
regeneration, freed from the physical and metabolic constraints of
the endosperm/aleurone and seed coat, exhibit aberrant growth
(Figures 2 and 3), supporting a regulatory role for the endosperm
and/or seed coat in seed development (Garcia et al., 2003, 2005;
Berger et al., 2006). In culture, somatic embryos form "monster"
embryos with an enlarged embryonic axis and diminished, sometimes
fused, cotyledons. Somatic embryos grown in SHaM medium are deemed
"healthy" (i.e., large, green, well-formed), often exceeding a fully
formed seed in size. Similar observations are obtained by culturing
immature zygotic cotyledons, which enlarge beyond the size of in
planta seed cotyledons. This suggests that the more an embryo is
fed, the larger it grows, even beyond the size of a standard seed.
Cultured embryos favor the accumulation of carbonaceous over
nitrogenous metabolites, yielding less protein per unit mass than
zygotic embryos. For ex vivo zygotic and somatic embryos, the
genotype-specific protein content and its proteome phenotype appear
to be less regulated; instead, storage substance accumulation
appears to have a direct relationship with nutrient input. The
differences between in planta and ex vivo embryo development and
storage substance accumulation indicate the significance of the in
planta circumstance of each seed as an interactive member of a
larger population.

FIGURE 2 | Comparison of the size and morphology of a mature seed
and a mature somatic embryo. Note that the somatic embryo can exceed
the seed's size, and the axis is enlarged compared with the
cotyledon.

FIGURE 3 | Comparison of the expression of the storage protein trait
in somatic embryos and seeds, using a GFP-HDEL glycinin mimic allele
expressed as a storage protein proxy. The somatic embryo in white
light (A) and UV fluorescence (B) shows that the primary GFP
expression site is the cotyledon, which is reduced in size compared
with the cotyledons of a seed. In seeds, the cotyledons comprise the
large majority of the seed's mass, shown as a chipped seed
illuminated with white light (C) and UV for GFP fluorescence (D).
Comparing expression of the same glycinin promoter-regulated GFP in
somatic embryos and seeds shows that while somatic embryos are a good
proxy for seed expression, there are challenges in interpreting
somatic embryo results as an accurate proxy for a seed's in planta
protein content and composition.

SEED PROTEIN CONTENT AND ITS VARYING PROTEOME 

For seed crops, of which soybean is a prominent example, historic
and modern breeding have selected for enhanced storage substance
accumulation. Generations of breeders have established that protein
content is a genetically determined trait that can be selected. The
regulation of protein content in relation to protein composition
appears to be a multilevel process, with the genotype establishing
protein content. Proteome rebalancing occurs in both dicots and
monocots, as shown by observations on soybean and maize (Wu and
Messing, 2014). From the perspective of the seed, which has to
establish the next generation of an annual plant, the capacity to
make compositional choices in response to altered metabolic
circumstances has selective advantages. Understanding the processes
that control proteome plasticity within the context of the protein
content phenotype can enable biotechnologists to create enhanced
soybeans optimized for specific end uses, such as species-specific
feed or protein bioreactors.